Microsoft and Tsinghua Introduce Reward Reasoning Models to Enhance LLM Judgment with Dynamic Compute Scaling
Researchers from Microsoft and Tsinghua propose Reward Reasoning Models, which adaptively allocate compute during evaluation to improve large language model judgment and alignment on complex tasks.